library(topGO)
library(knitr) # to use kable()
library(limma) # to use vennDiagram()
library(ggplot2)
library(variancePartition)
TAG <- params$TAG
VAR <- params$VAR
ANNOTATION <- list(genes = '../2019-07-26/genes/annotation.txt',
isoforms = '../2019-07-26/transcripts/annotation.txt')
DGEDIR <- '../2020-01-08/'
ENRICHMENT <- paste('../2020-01-14/Enrichment', TAG, VAR, 'RData', sep='.')
This is the enrichment analysis for genes regulated by regime. I am using the topGO package (Alexa and Rahnenfuhrer 2019), which is able to apply different algorithms. The starting point is an ordering of genes usually according to their association with a phenotype or any other relevant quantity, such as a measure of differential expression between two conditions. In this case, we order genes by their differential expression between regime levels. Following one of the suggestions in the manual, I originally ordered genes by either \(p\) value of the differential expression analysis, or by the (complement of the) amount of its expression variance explained by regime (see 2020-01-14). Those were quite equivalent orderings. However, they do not distinguish the sign of the fold change: both up- and down-regulated genes contribute to make a gene ontology term significant. It is one of the approaches recommended, and it makes sense, to the extent that among all the genes annotated with a function, some may display contrasting expression patterns, for example if both positive and negative regulators of that function are annotated with the same label. I thus, expect the use of \(p\) values in enrichment analysis to facilitate the detection of high level categories.
In contrast, genes sharing lower level categories (very specific GO terms) are more likely to be expressed in a similar way. Then, an ordering of genes that reflect the sign of the difference in expression levels between the two conditions compared would be more informative. In principle, enrichment methods should be able to detect an enrichment in either end of the list, rather than only in the top. That is actually the case for tests based on the Kolmogorov-Smirnov distribution, like GSEA (Subramanian et al., n.d.). I actually checked that reversing the order of genes produces the exact same results, as expected. Below, I use the Kolmogorov-Smirnov test.
For gene ordering, I use the \(t\) statistic of the differential expression analysis, which is positive when expression is higher in the random than in the regular regime, and vice versa.
Functional annotation is in 2019-07-26. I will also upload two lists of genes, with either proportion of variance explained by regime or p-value of differential expression test.
load(paste0(DGEDIR, TAG, '.RData'))
tagTStat <- topTable(fitmm, coef = VAR, num = length(fitmm$F), sort.by='t', resort.by='t')[,'t',drop=FALSE]
annotation <- read.table(ANNOTATION[[TAG]], col.names = c('tagname', 'goterms'))
To initialize the topGOdata object, I need the gene list as a named numeric vector, where the names are the genes identifiers and the numeric values, the \(t\) statistics from the differential expression analysis. The structure() function adds attributes to an object.
TStats <- structure(tagTStat[,1], names = row.names(tagTStat))
rm(tagTStat)
There are 18598 genes scored with a \(t\) statistic. The annotation data frame has more than one GO term for every tag, separated by |. I need a named list.
head(annotation)
## tagname goterms
## 1 XLOC_000002 GO:0008417|GO:0016020|GO:0006486
## 2 XLOC_000009 GO:0043130|GO:0005515|GO:0043161
## 3 XLOC_000010 GO:0005840|GO:0015935|GO:0003735|GO:0006412
## 4 XLOC_000015 GO:0006355|GO:0003700|GO:0006357|GO:0043565|GO:0005634|GO:0032502
## 5 XLOC_000021 GO:0016567|GO:0006397|GO:0061630|GO:0008270
## 6 XLOC_000036 GO:0005272|GO:0006814|GO:0016020
allgenes2GO <- strsplit(as.character(annotation$goterms), "|", fixed = TRUE)
names(allgenes2GO) <- annotation$tagname
rm(annotation)
There are 9691 genes with GO annotations. But the differential expression analysis includes many more genes. In order to include the not-annotated genes in the enrichment analysis, to see if annotated or not annotated genes are more or less often differentially expressed, I should attribute a GO term to them. According to [http://geneontology.org/docs/faq/] nowadays we express lack of annotation by annotating to the root nodes, i.e. GO:0008150 biological_process, GO:0003674 molecular_function, and GO:0005575 cellular_component.
for (tag in names(TStats)) {
if (! tag %in% names(allgenes2GO)) {
allgenes2GO <- append(allgenes2GO,
structure(list(c("GO:0008150", "GO:0003674", "GO:0005575")), names = tag))
}
}
Creation of a topGO dataset is documented in section 4 of topGO’s the user manual: https://bioconductor.org/packages/release/bioc/vignettes/topGO/inst/doc/topGO.pdf. I need to use the new command, and fill up the slots. The annot object must be a function that compiles “a list of GO terms such that each element in the list is a character vector containing all the gene identifiers that are mapped to the respective GO term.” There are several options, that you can check using help(annFUN.gene2GO), for example. The annFUN.gene2GO requires a gene2GO argument, which is the list of gene-to-GO terms I made before. The geneSelectionFun object is a function that selects the interesting (most significant) genes. It is required to perform Fisher’s exact test. The nodeSize is used to prune the GO hierarchy from the terms which have less than n annotated genes.
I generate three datasets, to analyse each of the three ontologies.
selection <- function(allScores) {return(abs(allScores) > 5.0)}
GOdata.BP <- new('topGOdata',
description = 'Differential expression among Brachionus plicatilis populations.',
ontology = 'BP',
allGenes = TStats,
annot = annFUN.gene2GO,
gene2GO = allgenes2GO,
geneSelectionFun = selection,
nodeSize = 5)
GOdata.MF <- new('topGOdata',
description = 'Differential expression among Brachionus plicatilis populations.',
ontology = 'MF',
allGenes = TStats,
annot = annFUN.gene2GO,
gene2GO = allgenes2GO,
geneSelectionFun = selection,
nodeSize = 5)
GOdata.CC <- new('topGOdata',
description = 'Differential expression among Brachionus plicatilis populations.',
ontology = 'CC',
allGenes = TStats,
annot = annFUN.gene2GO,
gene2GO = allgenes2GO,
geneSelectionFun = selection,
nodeSize = 5)
DataSummary <- data.frame(ontology = c('BP', 'MF', 'CC'),
Num_Genes = sapply(list(GOdata.BP, GOdata.MF, GOdata.CC), numGenes),
Num_GO_terms = sapply(list(GOdata.BP, GOdata.MF, GOdata.CC), function(x) length(usedGO(x))))
kable(DataSummary, caption='Number of feasible genes or transcripts and number of GO terms used in each data set.')
rm(DataSummary)
| ontology | Num_Genes | Num_GO_terms |
|---|---|---|
| BP | 15162 | 1154 |
| MF | 17429 | 576 |
| CC | 12940 | 291 |
There are more than one way to test for enrichment. Something that took me a while to understand is that not only there are different test statistics (Fisher’s exact test, Kolmogorov-Smirnov, and others) but also different algorithms: classic, elim, weight… The algorithms are ways to deal with the dependence structure among GO terms due to topology. Some algorithms are compatible with all statistics implemented in topGO. But the weight and the parentchild algorithms are only applicable to Fisher’s exact test. I am not interested in the classic algorithm, which treats GO nodes as independent, and therefore produces an excess of significant results. I will not use the Fisher’s exact test, because it dependes on an arbitrary threshold of significance on non-adjusted p-values.
BP.elim <- runTest(GOdata.BP, algorithm = "elim", statistic = "ks")
BP.weight01 <- runTest(GOdata.BP, algorithm = "weight01", statistic = "ks")
BP.lea <- runTest(GOdata.BP, algorithm = "lea", statistic = "ks")
MF.elim <- runTest(GOdata.MF, algorithm = "elim", statistic = "ks")
MF.weight01 <- runTest(GOdata.MF, algorithm = "weight01", statistic = "ks")
MF.lea <- runTest(GOdata.MF, algorithm = "lea", statistic = "ks")
CC.elim <- runTest(GOdata.CC, algorithm = "elim", statistic = "ks")
CC.weight01 <- runTest(GOdata.CC, algorithm = "weight01", statistic = "ks")
CC.lea <- runTest(GOdata.CC, algorithm = "lea", statistic = "ks")
ResultsSummary <- data.frame(ontology = rep(c("BP", "MF", "CC"), each = 3),
algorithm = rep(c("elim", "weight01", "lea"), 3),
TermsTested = sapply(list(BP.elim, BP.weight01, BP.lea, MF.elim, MF.weight01, MF.lea, CC.elim, CC.weight01, CC.lea), function(X) length(score(X))),
Significant = sapply(list(BP.elim, BP.weight01, BP.lea, MF.elim, MF.weight01, MF.lea, CC.elim, CC.weight01, CC.lea), function(X) sum(score(X) < 0.01)))
kable(ResultsSummary, caption="Number of non-trivial terms tested and those with a score (not corrected p-value) lower than 0.01.")
| ontology | algorithm | TermsTested | Significant |
|---|---|---|---|
| BP | elim | 1154 | 31 |
| BP | weight01 | 1154 | 32 |
| BP | lea | 1154 | 46 |
| MF | elim | 576 | 47 |
| MF | weight01 | 576 | 42 |
| MF | lea | 576 | 68 |
| CC | elim | 291 | 17 |
| CC | weight01 | 291 | 17 |
| CC | lea | 291 | 30 |
rm(ResultsSummary)
I don’t see a way to distinguish GO terms that are significant because they tend to appear near the top of the list of genes from those that are significant because they tend to appear near the bottom. The density plots are useful, but take too much space to show them for all terms.
Note that not all terms listed in the tables below are significant. The scores for the three methods (elim, weight01 and lea) are non-corrected p-values.
orderedTerms <- names(sort(score(BP.weight01)))
significant.weight01 <- score(BP.weight01)[orderedTerms] <= 0.01
significant.lea <- score(BP.lea)[orderedTerms] <= 0.01
significant.elim <- score(BP.elim)[orderedTerms] <= 0.01
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
BP.tstat.sigTerms <- sigTerms
BP.all <- GenTable(GOdata.BP, elim=BP.elim, weight01=BP.weight01, lea=BP.lea,
orderBy="weight01", ranksOf="elim", topNodes=sum(significant.elim))
kable(
BP.all[BP.all$Significant > BP.all$Expected,],
caption = "Most over-represented terms of the Biological Process ontology.")
| GO.ID | Term | Annotated | Significant | Expected | Rank in elim | elim | weight01 | lea | |
|---|---|---|---|---|---|---|---|---|---|
| 2 | GO:0006508 | proteolysis | 442 | 10 | 6.24 | 3 | 8.1e-10 | 8.4e-19 | 8.1e-10 |
| 4 | GO:0006030 | chitin metabolic process | 74 | 4 | 1.04 | 4 | 1.2e-09 | 3.0e-09 | 1.2e-09 |
| 6 | GO:0034220 | ion transmembrane transport | 147 | 3 | 2.07 | 10 | 2.6e-05 | 1.9e-07 | 2.2e-08 |
| 7 | GO:0055085 | transmembrane transport | 500 | 8 | 7.06 | 7 | 1.2e-06 | 4.2e-07 | 6.9e-13 |
| 9 | GO:0005975 | carbohydrate metabolic process | 202 | 10 | 2.85 | 18 | 0.00121 | 3.3e-06 | 0.00024 |
| 15 | GO:0007160 | cell-matrix adhesion | 16 | 1 | 0.23 | 14 | 0.00020 | 0.00020 | 0.00020 |
| 19 | GO:0005992 | trehalose biosynthetic process | 6 | 2 | 0.08 | 17 | 0.00110 | 0.00110 | 0.00110 |
| 22 | GO:0007017 | microtubule-based process | 195 | 4 | 2.75 | 148 | 0.09189 | 0.00189 | 2.5e-06 |
| 23 | GO:0006629 | lipid metabolic process | 195 | 4 | 2.75 | 678 | 0.82452 | 0.00228 | 0.82452 |
| 25 | GO:0006520 | cellular amino acid metabolic process | 117 | 3 | 1.65 | 286 | 0.26064 | 0.00374 | 0.32073 |
kable(
BP.all[BP.all$Significant < BP.all$Expected,],
caption = "Most under-represented terms of the Biological Process ontology.")
| GO.ID | Term | Annotated | Significant | Expected | Rank in elim | elim | weight01 | lea | |
|---|---|---|---|---|---|---|---|---|---|
| 1 | GO:0007186 | G protein-coupled receptor signaling pat… | 206 | 2 | 2.91 | 1 | 1.3e-26 | 4.4e-27 | 1.3e-26 |
| 3 | GO:0006814 | sodium ion transport | 88 | 0 | 1.24 | 2 | 5.8e-10 | 5.8e-10 | 5.8e-10 |
| 5 | GO:0006813 | potassium ion transport | 56 | 0 | 0.79 | 5 | 1.4e-07 | 1.4e-07 | 8.8e-10 |
| 8 | GO:0006486 | protein glycosylation | 75 | 0 | 1.06 | 11 | 0.00011 | 6.6e-07 | 0.00011 |
| 10 | GO:0060271 | cilium assembly | 46 | 0 | 0.65 | 6 | 9.5e-07 | 7.6e-06 | 1.4e-08 |
| 11 | GO:0015074 | DNA integration | 34 | 0 | 0.48 | 8 | 1.5e-05 | 1.5e-05 | 1.5e-05 |
| 12 | GO:0006811 | ion transport | 436 | 5 | 6.15 | 12 | 0.00014 | 2.6e-05 | 7.0e-19 |
| 13 | GO:0007165 | signal transduction | 544 | 7 | 7.68 | 23 | 0.00698 | 0.00013 | 0.00698 |
| 14 | GO:0007156 | homophilic cell adhesion via plasma memb… | 21 | 0 | 0.30 | 13 | 0.00015 | 0.00015 | 0.00015 |
| 16 | GO:0003341 | cilium movement | 10 | 0 | 0.14 | 15 | 0.00026 | 0.00026 | 0.00026 |
| 17 | GO:0006355 | regulation of transcription, DNA-templat… | 355 | 2 | 5.01 | 176 | 0.12252 | 0.00060 | 0.12252 |
| 18 | GO:0070588 | calcium ion transmembrane transport | 30 | 0 | 0.42 | 16 | 0.00101 | 0.00101 | 0.00101 |
| 20 | GO:0007154 | cell communication | 565 | 7 | 7.97 | 9 | 1.9e-05 | 0.00132 | 0.00016 |
| 21 | GO:0048870 | cell motility | 14 | 0 | 0.20 | 22 | 0.00581 | 0.00158 | 0.00581 |
| 24 | GO:0007018 | microtubule-based movement | 115 | 1 | 1.62 | 20 | 0.00303 | 0.00364 | 6.4e-05 |
| 26 | GO:0007166 | cell surface receptor signaling pathway | 72 | 1 | 1.02 | 293 | 0.26315 | 0.00387 | 0.26315 |
| 27 | GO:0006820 | anion transport | 81 | 1 | 1.14 | 74 | 0.02458 | 0.00536 | 0.02458 |
| 28 | GO:0022607 | cellular component assembly | 217 | 0 | 3.06 | 986 | 0.97883 | 0.00539 | 0.97305 |
| 29 | GO:0071805 | potassium ion transmembrane transport | 19 | 0 | 0.27 | 19 | 0.00124 | 0.00585 | 0.00124 |
| 30 | GO:0006313 | transposition, DNA-mediated | 10 | 0 | 0.14 | 25 | 0.00782 | 0.00782 | 0.00782 |
| 31 | GO:0007015 | actin filament organization | 31 | 0 | 0.44 | 225 | 0.17953 | 0.00802 | 0.01389 |
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
lea=significant.lea,
elim=significant.elim)))
kable(data.frame(Term = Term(GOTERM[sigTerms]),
Definition = Definition(GOTERM[sigTerms]),
PValue=score(BP.weight01)[sigTerms]),
caption = paste('Biological process terms significantly associated with',
VAR, 'according to all 3 algorithms', sep=' '))
| Term | Definition | PValue | |
|---|---|---|---|
| GO:0007186 | G protein-coupled receptor signaling pathway | A series of molecular signals that proceeds with an activated receptor promoting the exchange of GDP for GTP on the alpha-subunit of an associated heterotrimeric G-protein complex. The GTP-bound activated alpha-G-protein then dissociates from the beta- and gamma-subunits to further transmit the signal within the cell. The pathway begins with receptor-ligand interaction, or for basal GPCR signaling the pathway begins with the receptor activating its G protein in the absence of an agonist, and ends with regulation of a downstream cellular process, e.g. transcription. The pathway can start from the plasma membrane, Golgi or nuclear membrane (PMID:24568158 and PMID:16902576). | 0.0000000 |
| GO:0006508 | proteolysis | The hydrolysis of proteins into smaller polypeptides and/or amino acids by cleavage of their peptide bonds. | 0.0000000 |
| GO:0006814 | sodium ion transport | The directed movement of sodium ions (Na+) into, out of or within a cell, or between cells, by means of some agent such as a transporter or pore. | 0.0000000 |
| GO:0006030 | chitin metabolic process | The chemical reactions and pathways involving chitin, a linear polysaccharide consisting of beta-(1->4)-linked N-acetyl-D-glucosamine residues. | 0.0000000 |
| GO:0006813 | potassium ion transport | The directed movement of potassium ions (K+) into, out of or within a cell, or between cells, by means of some agent such as a transporter or pore. | 0.0000001 |
| GO:0034220 | ion transmembrane transport | A process in which an ion is transported across a membrane. | 0.0000002 |
| GO:0055085 | transmembrane transport | The process in which a solute is transported across a lipid bilayer, from one side of a membrane to the other. | 0.0000004 |
| GO:0006486 | protein glycosylation | A protein modification process that results in the addition of a carbohydrate or carbohydrate derivative unit to a protein amino acid, e.g. the addition of glycan chains to proteins. | 0.0000007 |
| GO:0005975 | carbohydrate metabolic process | The chemical reactions and pathways involving carbohydrates, any of a group of organic compounds based of the general formula Cx(H2O)y. Includes the formation of carbohydrate derivatives by the addition of a carbohydrate residue to another molecule. | 0.0000033 |
| GO:0060271 | cilium assembly | The assembly of a cilium, a specialized eukaryotic organelle that consists of a filiform extrusion of the cell surface. Each cilium is bounded by an extrusion of the cytoplasmic membrane, and contains a regular longitudinal array of microtubules, anchored basally in a centriole. | 0.0000076 |
| GO:0015074 | DNA integration | The process in which a segment of DNA is incorporated into another, usually larger, DNA molecule such as a chromosome. | 0.0000147 |
| GO:0006811 | ion transport | The directed movement of charged atoms or small charged molecules into, out of or within a cell, or between cells, by means of some agent such as a transporter or pore. | 0.0000263 |
| GO:0007165 | signal transduction | The cellular process in which a signal is conveyed to trigger a change in the activity or state of a cell. Signal transduction begins with reception of a signal (e.g. a ligand binding to a receptor or receptor activation by a stimulus such as light), or for signal transduction in the absence of ligand, signal-withdrawal or the activity of a constitutively active receptor. Signal transduction ends with regulation of a downstream cellular process, e.g. regulation of transcription or regulation of a metabolic process. Signal transduction covers signaling from receptors located on the surface of the cell and signaling via molecules located within the cell. For signaling between cells, signal transduction is restricted to events at and within the receiving cell. | 0.0001347 |
| GO:0007156 | homophilic cell adhesion via plasma membrane adhesion molecules | The attachment of a plasma membrane adhesion molecule in one cell to an identical molecule in an adjacent cell. | 0.0001501 |
| GO:0007160 | cell-matrix adhesion | The binding of a cell to the extracellular matrix via adhesion molecules. | 0.0001997 |
| GO:0003341 | cilium movement | The directed, self-propelled movement of a cilium. | 0.0002568 |
| GO:0070588 | calcium ion transmembrane transport | A process in which a calcium ion is transported from one side of a membrane to the other by means of some agent such as a transporter or pore. | 0.0010143 |
| GO:0005992 | trehalose biosynthetic process | The chemical reactions and pathways resulting in the formation of trehalose, a disaccharide isomeric with sucrose and obtained from certain lichens and fungi. | 0.0010957 |
| GO:0007154 | cell communication | Any process that mediates interactions between a cell and its surroundings. Encompasses interactions such as signaling or attachment between one cell and another cell, between a cell and an extracellular matrix, or between a cell and any other aspect of its environment. | 0.0013231 |
| GO:0048870 | cell motility | Any process involved in the controlled self-propelled movement of a cell that results in translocation of the cell from one place to another. | 0.0015760 |
| GO:0007018 | microtubule-based movement | A microtubule-based process that results in the movement of organelles, other microtubules, or other cellular components. Examples include motor-driven movement along microtubules and movement driven by polymerization or depolymerization of microtubules. | 0.0036411 |
| GO:0071805 | potassium ion transmembrane transport | A process in which a potassium ion is transported from one side of a membrane to the other. | 0.0058542 |
| GO:0006313 | transposition, DNA-mediated | Any process involved in a type of transpositional recombination which occurs via a DNA intermediate. | 0.0078178 |
| GO:0070286 | axonemal dynein complex assembly | The aggregation, arrangement and bonding together of a set of components to form an axonemal dynein complex, a dynein complex found in eukaryotic cilia and flagella, in which the motor domain heads interact with adjacent microtubules to generate a sliding force which is converted to a bending motion. | 0.0099394 |
I think the GO graph is useful to see the relationship among the significant terms. But too large graphs are impossible to read. I don’t know how to split the graph in meaningful subgraphs.
showSigOfNodes(GOdata.BP, score(BP.weight01),
firstSigNodes = sum(significant.elim),
wantedNodes = sigTerms)
## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 155
## Number of Edges = 276
##
## $complete.dag
## [1] "A graph with 155 nodes."
This is just a example of the most significant GO term:
showGroupDensity(GOdata.BP, orderedTerms[1], rm.one=FALSE)
showGroupDensity(GOdata.BP, orderedTerms[2], rm.one=FALSE)
showGroupDensity(GOdata.BP, orderedTerms[3], rm.one=FALSE)
orderedTerms <- names(sort(score(MF.weight01)))
significant.weight01 <- score(MF.weight01)[orderedTerms] <= 0.01
significant.lea <- score(MF.lea)[orderedTerms] <= 0.01
significant.elim <- score(MF.elim)[orderedTerms] <= 0.01
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
MF.tstat.sigTerms <- sigTerms
MF.all <- GenTable(GOdata.MF, elim=MF.elim, weight01=MF.weight01, lea=MF.lea,
orderBy="weight01", ranksOf="elim", topNodes=sum(significant.elim))
kable(
MF.all[MF.all$Significant > MF.all$Expected,],
caption = "Most over-represented terms of the Molecular Function ontology.")
| GO.ID | Term | Annotated | Significant | Expected | Rank in elim | elim | weight01 | lea | |
|---|---|---|---|---|---|---|---|---|---|
| 4 | GO:0008061 | chitin binding | 77 | 4 | 1.13 | 3 | 1.1e-09 | 1.1e-09 | 1.1e-09 |
| 6 | GO:0004252 | serine-type endopeptidase activity | 88 | 4 | 1.29 | 6 | 2.3e-08 | 2.3e-08 | 2.3e-08 |
| 14 | GO:0005200 | structural constituent of cytoskeleton | 17 | 2 | 0.25 | 13 | 0.00023 | 0.00023 | 0.00023 |
| 19 | GO:0030248 | cellulose binding | 7 | 1 | 0.10 | 21 | 0.00090 | 0.00090 | 0.00090 |
| 20 | GO:0020037 | heme binding | 95 | 3 | 1.40 | 23 | 0.00112 | 0.00112 | 0.00112 |
| 21 | GO:0016715 | oxidoreductase activity, acting on paire… | 14 | 3 | 0.21 | 24 | 0.00123 | 0.00123 | 0.00123 |
| 23 | GO:0031409 | pigment binding | 10 | 1 | 0.15 | 28 | 0.00172 | 0.00172 | 0.00172 |
| 24 | GO:0030246 | carbohydrate binding | 51 | 2 | 0.75 | 39 | 0.00644 | 0.00184 | 0.00019 |
| 28 | GO:0004601 | peroxidase activity | 34 | 2 | 0.50 | 31 | 0.00300 | 0.00300 | 2.8e-05 |
| 30 | GO:0008233 | peptidase activity | 399 | 8 | 5.86 | 304 | 0.64934 | 0.00375 | 0.96640 |
| 39 | GO:0004553 | hydrolase activity, hydrolyzing O-glycos… | 100 | 4 | 1.47 | 26 | 0.00164 | 0.00806 | 0.00164 |
| 42 | GO:0008017 | microtubule binding | 62 | 1 | 0.91 | 47 | 0.00959 | 0.00959 | 0.00959 |
| 47 | GO:0052689 | carboxylic ester hydrolase activity | 43 | 1 | 0.63 | 54 | 0.01154 | 0.01226 | 0.01154 |
kable(
MF.all[MF.all$Significant < MF.all$Expected,],
caption = "Most under-represented terms of the Molecular Function ontology.")
| GO.ID | Term | Annotated | Significant | Expected | Rank in elim | elim | weight01 | lea | |
|---|---|---|---|---|---|---|---|---|---|
| 1 | GO:0005509 | calcium ion binding | 361 | 3 | 5.30 | 1 | < 1e-30 | < 1e-30 | < 1e-30 |
| 2 | GO:0004930 | G protein-coupled receptor activity | 180 | 2 | 2.64 | 2 | 5.7e-23 | 9.3e-22 | 1.3e-26 |
| 3 | GO:0043565 | sequence-specific DNA binding | 146 | 0 | 2.14 | 4 | 2.0e-09 | 1.4e-11 | 2.0e-09 |
| 5 | GO:0005272 | sodium channel activity | 73 | 0 | 1.07 | 5 | 7.0e-09 | 7.0e-09 | 7.0e-09 |
| 7 | GO:0003774 | motor activity | 131 | 1 | 1.92 | 7 | 1.3e-07 | 1.3e-07 | 2.9e-07 |
| 8 | GO:0004181 | metallocarboxypeptidase activity | 19 | 0 | 0.28 | 8 | 2.2e-07 | 2.2e-07 | 2.2e-07 |
| 9 | GO:0005515 | protein binding | 2101 | 30 | 30.86 | 17 | 0.00070 | 1.4e-06 | 0.00128 |
| 10 | GO:0005249 | voltage-gated potassium channel activity | 27 | 0 | 0.40 | 9 | 1.7e-06 | 1.7e-06 | 1.7e-06 |
| 11 | GO:0004888 | transmembrane signaling receptor activit… | 251 | 2 | 3.69 | 11 | 6.3e-06 | 6.1e-06 | 3.6e-30 |
| 12 | GO:0008417 | fucosyltransferase activity | 33 | 0 | 0.48 | 12 | 8.8e-06 | 8.8e-06 | 8.8e-06 |
| 13 | GO:0005230 | extracellular ligand-gated ion channel a… | 54 | 0 | 0.79 | 10 | 1.7e-06 | 1.0e-05 | 9.2e-08 |
| 15 | GO:0004222 | metalloendopeptidase activity | 73 | 1 | 1.07 | 14 | 0.00031 | 0.00031 | 0.00031 |
| 16 | GO:0005328 | neurotransmitter:sodium symporter activi… | 18 | 0 | 0.26 | 15 | 0.00055 | 0.00055 | 0.00055 |
| 17 | GO:0051015 | actin filament binding | 30 | 0 | 0.44 | 16 | 0.00070 | 0.00070 | 0.00070 |
| 18 | GO:0003777 | microtubule motor activity | 98 | 1 | 1.44 | 20 | 0.00085 | 0.00085 | 0.00085 |
| 22 | GO:0004096 | catalase activity | 15 | 0 | 0.22 | 25 | 0.00156 | 0.00156 | 0.00156 |
| 25 | GO:0004983 | neuropeptide Y receptor activity | 9 | 0 | 0.13 | 29 | 0.00188 | 0.00188 | 0.00188 |
| 26 | GO:0015293 | symporter activity | 32 | 0 | 0.47 | 34 | 0.00447 | 0.00238 | 1.6e-05 |
| 27 | GO:0008378 | galactosyltransferase activity | 23 | 0 | 0.34 | 30 | 0.00256 | 0.00256 | 0.00256 |
| 29 | GO:0004190 | aspartic-type endopeptidase activity | 28 | 0 | 0.41 | 32 | 0.00306 | 0.00306 | 0.00306 |
| 31 | GO:0008528 | G protein-coupled peptide receptor activ… | 15 | 0 | 0.22 | 33 | 0.00428 | 0.00428 | 1.6e-05 |
| 32 | GO:0031683 | G-protein beta/gamma-subunit complex bin… | 11 | 0 | 0.16 | 37 | 0.00523 | 0.00523 | 0.00523 |
| 33 | GO:0004089 | carbonate dehydratase activity | 11 | 0 | 0.16 | 38 | 0.00610 | 0.00610 | 0.00610 |
| 34 | GO:0005452 | inorganic anion exchanger activity | 6 | 0 | 0.09 | 40 | 0.00676 | 0.00676 | 0.00676 |
| 35 | GO:0070006 | metalloaminopeptidase activity | 5 | 0 | 0.07 | 41 | 0.00684 | 0.00684 | 0.00684 |
| 36 | GO:0022848 | acetylcholine-gated cation-selective cha… | 16 | 0 | 0.24 | 42 | 0.00687 | 0.00687 | 0.00687 |
| 37 | GO:0008138 | protein tyrosine/serine/threonine phosph… | 23 | 0 | 0.34 | 43 | 0.00749 | 0.00749 | 0.00749 |
| 38 | GO:0030215 | semaphorin receptor binding | 7 | 0 | 0.10 | 44 | 0.00779 | 0.00779 | 0.00779 |
| 40 | GO:0004435 | phosphatidylinositol phospholipase C act… | 6 | 0 | 0.09 | 46 | 0.00834 | 0.00834 | 0.00834 |
| 41 | GO:0016757 | transferase activity, transferring glyco… | 194 | 2 | 2.85 | 335 | 0.71845 | 0.00903 | 0.01913 |
| 43 | GO:0005216 | ion channel activity | 224 | 0 | 3.29 | 27 | 0.00169 | 0.01088 | 1.9e-18 |
| 44 | GO:0019825 | oxygen binding | 8 | 0 | 0.12 | 51 | 0.01146 | 0.01146 | 0.01146 |
| 45 | GO:0004970 | ionotropic glutamate receptor activity | 7 | 0 | 0.10 | 52 | 0.01146 | 0.01146 | 0.01146 |
| 46 | GO:0019205 | nucleobase-containing compound kinase ac… | 26 | 0 | 0.38 | 35 | 0.00461 | 0.01193 | 0.00461 |
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
lea=significant.lea,
elim=significant.elim)))
kable(data.frame(Term=Term(GOTERM[sigTerms]),
Definition=Definition(GOTERM[sigTerms]),
PValue=score(MF.weight01)[sigTerms]),
caption = paste('Molecular function terms significantly associated with', VAR,
'according to all 3 algorithms', sep=' '))
| Term | Definition | PValue | |
|---|---|---|---|
| GO:0005509 | calcium ion binding | Interacting selectively and non-covalently with calcium ions (Ca2+). | 0.0000000 |
| GO:0004930 | G protein-coupled receptor activity | Combining with an extracellular signal and transmitting the signal across the membrane by activating an associated G-protein; promotes the exchange of GDP for GTP on the alpha subunit of a heterotrimeric G-protein complex. | 0.0000000 |
| GO:0043565 | sequence-specific DNA binding | Interacting selectively and non-covalently with DNA of a specific nucleotide composition, e.g. GC-rich DNA binding, or with a specific sequence motif or type of DNA e.g. promotor binding or rDNA binding. | 0.0000000 |
| GO:0008061 | chitin binding | Interacting selectively and non-covalently with chitin, a linear polysaccharide consisting of beta-(1->4)-linked N-acetyl-D-glucosamine residues. | 0.0000000 |
| GO:0005272 | sodium channel activity | Enables the facilitated diffusion of a sodium ion (by an energy-independent process) involving passage through a transmembrane aqueous pore or channel without evidence for a carrier-mediated mechanism. | 0.0000000 |
| GO:0004252 | serine-type endopeptidase activity | Catalysis of the hydrolysis of internal, alpha-peptide bonds in a polypeptide chain by a catalytic mechanism that involves a catalytic triad consisting of a serine nucleophile that is activated by a proton relay involving an acidic residue (e.g. aspartate or glutamate) and a basic residue (usually histidine). | 0.0000000 |
| GO:0003774 | motor activity | Catalysis of the generation of force resulting either in movement along a microfilament or microtubule, or in torque resulting in membrane scission, coupled to the hydrolysis of a nucleoside triphosphate. | 0.0000001 |
| GO:0004181 | metallocarboxypeptidase activity | Catalysis of the hydrolysis of C-terminal amino acid residues from a polypeptide chain by a mechanism in which water acts as a nucleophile, one or two metal ions hold the water molecule in place, and charged amino acid side chains are ligands for the metal ions. | 0.0000002 |
| GO:0005515 | protein binding | Interacting selectively and non-covalently with any protein or protein complex (a complex of two or more proteins that may include other nonprotein molecules). | 0.0000014 |
| GO:0005249 | voltage-gated potassium channel activity | Enables the transmembrane transfer of a potassium ion by a voltage-gated channel. A voltage-gated channel is a channel whose open state is dependent on the voltage across the membrane in which it is embedded. | 0.0000017 |
| GO:0004888 | transmembrane signaling receptor activity | Combining with an extracellular or intracellular signal and transmitting the signal from one side of the membrane to the other to initiate a change in cell activity or state as part of signal transduction. | 0.0000061 |
| GO:0008417 | fucosyltransferase activity | Catalysis of the transfer of a fucosyl group to an acceptor molecule, typically another carbohydrate or a lipid. | 0.0000088 |
| GO:0005230 | extracellular ligand-gated ion channel activity | Enables the transmembrane transfer of an ion by a channel that opens when a specific extracellular ligand has been bound by the channel complex or one of its constituent parts. | 0.0000100 |
| GO:0005200 | structural constituent of cytoskeleton | The action of a molecule that contributes to the structural integrity of a cytoskeletal structure. | 0.0002305 |
| GO:0004222 | metalloendopeptidase activity | Catalysis of the hydrolysis of internal, alpha-peptide bonds in a polypeptide chain by a mechanism in which water acts as a nucleophile, one or two metal ions hold the water molecule in place, and charged amino acid side chains are ligands for the metal ions. | 0.0003142 |
| GO:0005328 | neurotransmitter:sodium symporter activity | Enables the transfer of a solute or solutes from one side of a membrane to the other according to the reaction: neurotransmitter(out) + Na+(out) = neurotransmitter(in) + Na+(in). | 0.0005460 |
| GO:0051015 | actin filament binding | Interacting selectively and non-covalently with an actin filament, also known as F-actin, a helical filamentous polymer of globular G-actin subunits. | 0.0006982 |
| GO:0003777 | microtubule motor activity | Catalysis of movement along a microtubule, coupled to the hydrolysis of a nucleoside triphosphate (usually ATP). | 0.0008454 |
| GO:0030248 | cellulose binding | Interacting selectively and non-covalently with cellulose. | 0.0009042 |
| GO:0020037 | heme binding | Interacting selectively and non-covalently with heme, any compound of iron complexed in a porphyrin (tetrapyrrole) ring. | 0.0011192 |
| GO:0016715 | oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen, reduced ascorbate as one donor, and incorporation of one atom of oxygen | Catalysis of an oxidation-reduction (redox) reaction in which hydrogen or electrons are transferred from reduced ascorbate and one other donor, and one atom of oxygen is incorporated into one donor. | 0.0012308 |
| GO:0004096 | catalase activity | Catalysis of the reaction: 2 hydrogen peroxide = O2 + 2 H2O. | 0.0015600 |
| GO:0031409 | pigment binding | Interacting selectively and non-covalently with a pigment, any general or particular coloring matter in living organisms, e.g. melanin. | 0.0017172 |
| GO:0030246 | carbohydrate binding | Interacting selectively and non-covalently with any carbohydrate, which includes monosaccharides, oligosaccharides and polysaccharides as well as substances derived from monosaccharides by reduction of the carbonyl group (alditols), by oxidation of one or more hydroxy groups to afford the corresponding aldehydes, ketones, or carboxylic acids, or by replacement of one or more hydroxy group(s) by a hydrogen atom. Cyclitols are generally not regarded as carbohydrates. | 0.0018445 |
| GO:0004983 | neuropeptide Y receptor activity | Combining with neuropeptide Y to initiate a change in cell activity. | 0.0018782 |
| GO:0015293 | symporter activity | Enables the active transport of a solute across a membrane by a mechanism whereby two or more species are transported together in the same direction in a tightly coupled process not directly linked to a form of energy other than chemiosmotic energy. | 0.0023817 |
| GO:0008378 | galactosyltransferase activity | Catalysis of the transfer of a galactosyl group to an acceptor molecule, typically another carbohydrate or a lipid. | 0.0025622 |
| GO:0004601 | peroxidase activity | Catalysis of the reaction: donor + hydrogen peroxide = oxidized donor + 2 H2O. | 0.0030028 |
| GO:0004190 | aspartic-type endopeptidase activity | Catalysis of the hydrolysis of internal, alpha-peptide bonds in a polypeptide chain by a mechanism in which a water molecule bound by the side chains of aspartic residues at the active center acts as a nucleophile. | 0.0030648 |
| GO:0008528 | G protein-coupled peptide receptor activity | Combining with a peptide and transmitting the signal across the membrane by activating an associated G-protein; promotes the exchange of GDP for GTP on the alpha subunit of a heterotrimeric G-protein complex. | 0.0042829 |
| GO:0031683 | G-protein beta/gamma-subunit complex binding | Interacting selectively and non-covalently with a complex of G-protein beta/gamma subunits. | 0.0052275 |
| GO:0004089 | carbonate dehydratase activity | Catalysis of the reaction: H2CO3 = CO2 + H2O. | 0.0060993 |
| GO:0005452 | inorganic anion exchanger activity | Enables the transfer of a solute or solutes from one side of a membrane to the other according to the reaction: inorganic anion A(out) + inorganic anion B(in) = inorganic anion A(in) + inorganic anion B(out). | 0.0067585 |
| GO:0070006 | metalloaminopeptidase activity | Catalysis of the hydrolysis of N-terminal amino acid residues from a polypeptide chain by a mechanism in which water acts as a nucleophile, one or two metal ions hold the water molecule in place, and charged amino acid side chains are ligands for the metal ions. | 0.0068381 |
| GO:0022848 | acetylcholine-gated cation-selective channel activity | Selectively enables the transmembrane transfer of a cation by a channel that opens upon binding acetylcholine. | 0.0068744 |
| GO:0008138 | protein tyrosine/serine/threonine phosphatase activity | Catalysis of the reactions: protein serine + H2O = protein serine + phosphate; protein threonine phosphate + H2O = protein threonine + phosphate; and protein tyrosine phosphate + H2O = protein tyrosine + phosphate. | 0.0074915 |
| GO:0030215 | semaphorin receptor binding | Interacting selectively and non-covalently with semaphorin receptors. | 0.0077865 |
| GO:0004553 | hydrolase activity, hydrolyzing O-glycosyl compounds | Catalysis of the hydrolysis of any O-glycosyl bond. | 0.0080587 |
| GO:0004435 | phosphatidylinositol phospholipase C activity | Catalysis of the reaction: 1-phosphatidyl-1D-myo-inositol 4,5-bisphosphate + H(2)O = 1,2-diacylglycerol + 1D-myo-inositol 1,4,5-trisphosphate + H(+). | 0.0083379 |
| GO:0008017 | microtubule binding | Interacting selectively and non-covalently with microtubules, filaments composed of tubulin monomers. | 0.0095925 |
showSigOfNodes(GOdata.MF, score(MF.weight01),
firstSigNodes = sum(significant.elim),
wantedNodes = sigTerms)
## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 149
## Number of Edges = 194
##
## $complete.dag
## [1] "A graph with 149 nodes."
showGroupDensity(GOdata.MF, orderedTerms[1], rm.one=FALSE)
showGroupDensity(GOdata.MF, orderedTerms[2], rm.one=FALSE)
showGroupDensity(GOdata.MF, orderedTerms[3], rm.one=FALSE)
orderedTerms <- names(sort(score(CC.weight01)))
significant.weight01 <- score(CC.weight01)[orderedTerms] <= 0.01
significant.lea <- score(CC.lea)[orderedTerms] <= 0.01
significant.elim <- score(CC.elim)[orderedTerms] <= 0.01
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
CC.tstat.sigTerms <- sigTerms
CC.all <- GenTable(GOdata.CC, elim=CC.elim, weight01=CC.weight01, lea=CC.lea,
orderBy="weight01", ranksOf="elim", topNodes=sum(significant.elim))
kable(
CC.all[CC.all$Significant > CC.all$Expected,],
caption = "Most over-represented terms of the Cellular Component ontology.")
| GO.ID | Term | Annotated | Significant | Expected | Rank in elim | elim | weight01 | lea | |
|---|---|---|---|---|---|---|---|---|---|
| 1 | GO:0016021 | integral component of membrane | 885 | 16 | 11.70 | 1 | < 1e-30 | < 1e-30 | < 1e-30 |
| 2 | GO:0016020 | membrane | 1565 | 23 | 20.68 | 3 | 6.9e-12 | 1.5e-19 | < 1e-30 |
| 3 | GO:0005576 | extracellular region | 140 | 6 | 1.85 | 2 | 4.1e-15 | 7.5e-14 | 1.2e-17 |
| 10 | GO:0005874 | microtubule | 29 | 2 | 0.38 | 11 | 0.00221 | 0.00221 | 0.00221 |
| 12 | GO:0042555 | MCM complex | 10 | 4 | 0.13 | 14 | 0.00302 | 0.00302 | 0.00302 |
| 17 | GO:0005887 | integral component of plasma membrane | 118 | 2 | 1.56 | 13 | 0.00293 | 0.00713 | 0.00293 |
kable(
CC.all[CC.all$Significant < CC.all$Expected,],
caption = "Most under-represented terms of the Cellular Component ontology.")
| GO.ID | Term | Annotated | Significant | Expected | Rank in elim | elim | weight01 | lea | |
|---|---|---|---|---|---|---|---|---|---|
| 4 | GO:0016459 | myosin complex | 33 | 0 | 0.44 | 4 | 5.2e-08 | 5.2e-08 | 5.2e-08 |
| 5 | GO:0008076 | voltage-gated potassium channel complex | 13 | 0 | 0.17 | 5 | 3.2e-06 | 3.2e-06 | 3.2e-06 |
| 6 | GO:0005886 | plasma membrane | 170 | 2 | 2.25 | 6 | 0.00019 | 0.00016 | 2.1e-12 |
| 7 | GO:0098797 | plasma membrane protein complex | 34 | 0 | 0.45 | 8 | 0.00033 | 0.00049 | 3.7e-09 |
| 8 | GO:0031012 | extracellular matrix | 9 | 0 | 0.12 | 9 | 0.00081 | 0.00081 | 0.00081 |
| 9 | GO:0034704 | calcium channel complex | 5 | 0 | 0.07 | 10 | 0.00185 | 0.00185 | 0.00185 |
| 11 | GO:0005929 | cilium | 27 | 0 | 0.36 | 12 | 0.00279 | 0.00279 | 9.8e-08 |
| 13 | GO:0005858 | axonemal dynein complex | 5 | 0 | 0.07 | 15 | 0.00371 | 0.00371 | 0.00371 |
| 14 | GO:0045211 | postsynaptic membrane | 16 | 0 | 0.21 | 16 | 0.00428 | 0.00428 | 0.00428 |
| 15 | GO:0031225 | anchored component of membrane | 6 | 0 | 0.08 | 17 | 0.00588 | 0.00588 | 0.00588 |
| 16 | GO:0044430 | cytoskeletal part | 152 | 2 | 2.01 | 41 | 0.05708 | 0.00594 | 2.7e-08 |
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
lea=significant.lea,
elim=significant.elim)))
kable(data.frame(Term=Term(GOTERM[sigTerms]),
Definition=Definition(GOTERM[sigTerms]),
PValue=score(CC.weight01)[sigTerms]),
caption = paste('Cellular component terms significantly associated with', VAR,
'according to all 3 algorithms', sep=' '))
| Term | Definition | PValue | |
|---|---|---|---|
| GO:0016021 | integral component of membrane | The component of a membrane consisting of the gene products and protein complexes having at least some part of their peptide sequence embedded in the hydrophobic region of the membrane. | 0.0000000 |
| GO:0016020 | membrane | A lipid bilayer along with all the proteins and protein complexes embedded in it an attached to it. | 0.0000000 |
| GO:0005576 | extracellular region | The space external to the outermost structure of a cell. For cells without external protective or external encapsulating structures this refers to space outside of the plasma membrane. This term covers the host cell environment outside an intracellular parasite. | 0.0000000 |
| GO:0016459 | myosin complex | A protein complex, formed of one or more myosin heavy chains plus associated light chains and other proteins, that functions as a molecular motor; uses the energy of ATP hydrolysis to move actin filaments or to move vesicles or other cargo on fixed actin filaments; has magnesium-ATPase activity and binds actin. Myosin classes are distinguished based on sequence features of the motor, or head, domain, but also have distinct tail regions that are believed to bind specific cargoes. | 0.0000001 |
| GO:0008076 | voltage-gated potassium channel complex | A protein complex that forms a transmembrane channel through which potassium ions may cross a cell membrane in response to changes in membrane potential. | 0.0000032 |
| GO:0005886 | plasma membrane | The membrane surrounding a cell that separates the cell from its external environment. It consists of a phospholipid bilayer and associated proteins. | 0.0001593 |
| GO:0098797 | plasma membrane protein complex | Any protein complex that is part of the plasma membrane. | 0.0004918 |
| GO:0031012 | extracellular matrix | A structure lying external to one or more cells, which provides structural support, biochemical or biomechanical cues for cells or tissues. | 0.0008138 |
| GO:0034704 | calcium channel complex | An ion channel complex through which calcium ions pass. | 0.0018495 |
| GO:0005874 | microtubule | Any of the long, generally straight, hollow tubes of internal diameter 12-15 nm and external diameter 24 nm found in a wide variety of eukaryotic cells; each consists (usually) of 13 protofilaments of polymeric tubulin, staggered in such a manner that the tubulin monomers are arranged in a helical pattern on the microtubular surface, and with the alpha/beta axes of the tubulin subunits parallel to the long axis of the tubule; exist in equilibrium with pool of tubulin monomers and can be rapidly assembled or disassembled in response to physiological stimuli; concerned with force generation, e.g. in the spindle. | 0.0022146 |
| GO:0005929 | cilium | A specialized eukaryotic organelle that consists of a filiform extrusion of the cell surface and of some cytoplasmic parts. Each cilium is largely bounded by an extrusion of the cytoplasmic (plasma) membrane, and contains a regular longitudinal array of microtubules, anchored to a basal body. | 0.0027889 |
| GO:0042555 | MCM complex | A hexameric protein complex required for the initiation and regulation of DNA replication. | 0.0030211 |
| GO:0005858 | axonemal dynein complex | A dynein complex found in eukaryotic cilia and flagella; the motor domain heads interact with adjacent microtubules to generate a sliding force which is converted to a bending motion. | 0.0037087 |
| GO:0045211 | postsynaptic membrane | A specialized area of membrane facing the presynaptic membrane on the tip of the nerve ending and separated from it by a minute cleft (the synaptic cleft). Neurotransmitters cross the synaptic cleft and transmit the signal to the postsynaptic membrane. | 0.0042845 |
| GO:0031225 | anchored component of membrane | The component of a membrane consisting of the gene products that are tethered to the membrane only by a covalently attached anchor, such as a lipid group that is embedded in the membrane. Gene products with peptide sequences that are embedded in the membrane are excluded from this grouping. | 0.0058784 |
| GO:0005887 | integral component of plasma membrane | The component of the plasma membrane consisting of the gene products and protein complexes having at least some part of their peptide sequence embedded in the hydrophobic region of the membrane. | 0.0071259 |
showSigOfNodes(GOdata.CC, score(CC.weight01),
firstSigNodes = sum(significant.elim),
wantedNodes = sigTerms)
## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 68
## Number of Edges = 117
##
## $complete.dag
## [1] "A graph with 68 nodes."
showGroupDensity(GOdata.CC, orderedTerms[1], rm.one=FALSE)
showGroupDensity(GOdata.CC, orderedTerms[2], rm.one=FALSE)
showGroupDensity(GOdata.CC, orderedTerms[3], rm.one=FALSE)
I want to compare these results with those from 2020-01-14, where I used the \(p\) value of the differential expression analysis to order the genes. I can import the data from 2020-01-14, but I need to do it in a new environment to prevent overwriting the current results.
load(ENRICHMENT, ex <- new.env())
allTerms <- usedGO(GOdata.BP)
BP.pvalue.sigTerms <- with(ex, BP.pvalue.sigTerms)
vennDiagram(vennCounts(cbind(TStat = allTerms %in% BP.tstat.sigTerms,
PValue = allTerms %in% BP.pvalue.sigTerms)))
# New terms:
as.data.frame(Term(GOTERM[setdiff(BP.tstat.sigTerms, BP.pvalue.sigTerms)]))
## Term(GOTERM[setdiff(BP.tstat.sigTerms, BP.pvalue.sigTerms)])
## GO:0006814 sodium ion transport
## GO:0006030 chitin metabolic process
## GO:0006486 protein glycosylation
## GO:0005975 carbohydrate metabolic process
## GO:0015074 DNA integration
## GO:0006811 ion transport
## GO:0007156 homophilic cell adhesion via plasma membrane adhesion molecules
## GO:0070588 calcium ion transmembrane transport
## GO:0007154 cell communication
## GO:0048870 cell motility
## GO:0007018 microtubule-based movement
## GO:0071805 potassium ion transmembrane transport
## GO:0006313 transposition, DNA-mediated
## GO:0070286 axonemal dynein complex assembly
# Absent terms:
as.data.frame(Term(GOTERM[setdiff(BP.pvalue.sigTerms, BP.tstat.sigTerms)]))
## Term(GOTERM[setdiff(BP.pvalue.sigTerms, BP.tstat.sigTerms)])
## GO:0006468 protein phosphorylation
## GO:0055114 oxidation-reduction process
## GO:0006289 nucleotide-excision repair
## GO:0006355 regulation of transcription, DNA-templated
## GO:0006470 protein dephosphorylation
## GO:0006979 response to oxidative stress
## GO:1901642 nucleoside transmembrane transport
## GO:0043161 proteasome-mediated ubiquitin-dependent protein catabolic process
## GO:0046942 carboxylic acid transport
BPp.weight01 <- with(ex, BP.weight01)
ggplot(data = data.frame(TStat = rank(score(BP.weight01))[allTerms],
PValue = rank(score(BPp.weight01))[allTerms]),
mapping = aes(x = TStat, y = PValue)) +
geom_point() + geom_smooth() + xlab('Using gene t statistics') +
ylab('Using gene p values') +
ggtitle('Ordering of BP terms by significance')
allTerms <- usedGO(GOdata.MF)
MF.pvalue.sigTerms <- with(ex, MF.pvalue.sigTerms)
vennDiagram(vennCounts(cbind(TStat = allTerms %in% MF.tstat.sigTerms,
PValue = allTerms %in% MF.pvalue.sigTerms)))
# New terms:
as.data.frame(Term(GOTERM[setdiff(MF.tstat.sigTerms, MF.pvalue.sigTerms)]))
## Term(GOTERM[setdiff(MF.tstat.sigTerms, MF.pvalue.sigTerms)])
## GO:0043565 sequence-specific DNA binding
## GO:0008061 chitin binding
## GO:0005272 sodium channel activity
## GO:0004888 transmembrane signaling receptor activity
## GO:0008417 fucosyltransferase activity
## GO:0005230 extracellular ligand-gated ion channel activity
## GO:0005328 neurotransmitter:sodium symporter activity
## GO:0051015 actin filament binding
## GO:0003777 microtubule motor activity
## GO:0020037 heme binding
## GO:0004096 catalase activity
## GO:0031409 pigment binding
## GO:0030246 carbohydrate binding
## GO:0004983 neuropeptide Y receptor activity
## GO:0015293 symporter activity
## GO:0008378 galactosyltransferase activity
## GO:0004601 peroxidase activity
## GO:0004190 aspartic-type endopeptidase activity
## GO:0008528 G protein-coupled peptide receptor activity
## GO:0031683 G-protein beta/gamma-subunit complex binding
## GO:0004089 carbonate dehydratase activity
## GO:0005452 inorganic anion exchanger activity
## GO:0070006 metalloaminopeptidase activity
## GO:0022848 acetylcholine-gated cation-selective channel activity
## GO:0030215 semaphorin receptor binding
## GO:0004553 hydrolase activity, hydrolyzing O-glycosyl compounds
## GO:0004435 phosphatidylinositol phospholipase C activity
# Absent terms:
as.data.frame(Term(GOTERM[setdiff(MF.pvalue.sigTerms, MF.tstat.sigTerms)]))
## Term(GOTERM[setdiff(MF.pvalue.sigTerms, MF.tstat.sigTerms)])
## GO:0005525 GTP binding
## GO:0004672 protein kinase activity
## GO:0005524 ATP binding
## GO:0005507 copper ion binding
## GO:0061630 ubiquitin protein ligase activity
## GO:0004198 calcium-dependent cysteine-type endopeptidase activity
## GO:0003924 GTPase activity
## GO:0042626 ATPase activity, coupled to transmembrane movement of substances
## GO:0005337 nucleoside transmembrane transporter activity
## GO:0016787 hydrolase activity
## GO:0016840 carbon-nitrogen lyase activity
## GO:0047631 ADP-ribose diphosphatase activity
## GO:0005506 iron ion binding
MFp.weight01 <- with(ex, MF.weight01)
ggplot(data = data.frame(TStat = rank(score(MF.weight01))[allTerms],
PValue = rank(score(MFp.weight01))[allTerms]),
mapping = aes(x = TStat, y = PValue)) +
geom_point() + geom_smooth() + xlab('Using gene t statistics') +
ylab('Using gene p values') +
ggtitle('Ordering of MF terms by significance')
allTerms <- usedGO(GOdata.CC)
CC.pvalue.sigTerms <- with(ex, CC.pvalue.sigTerms)
vennDiagram(vennCounts(cbind(TStat = allTerms %in% CC.tstat.sigTerms,
PValue = allTerms %in% CC.pvalue.sigTerms)))
# New terms:
as.data.frame(Term(GOTERM[setdiff(CC.tstat.sigTerms, CC.pvalue.sigTerms)]))
## Term(GOTERM[setdiff(CC.tstat.sigTerms, CC.pvalue.sigTerms)])
## GO:0098797 plasma membrane protein complex
## GO:0031012 extracellular matrix
## GO:0034704 calcium channel complex
## GO:0005874 microtubule
## GO:0005929 cilium
## GO:0005858 axonemal dynein complex
## GO:0045211 postsynaptic membrane
## GO:0031225 anchored component of membrane
# Absent terms:
as.data.frame(Term(GOTERM[setdiff(CC.pvalue.sigTerms, CC.tstat.sigTerms)]))
## Term(GOTERM[setdiff(CC.pvalue.sigTerms, CC.tstat.sigTerms)])
## GO:0098800 inner mitochondrial membrane protein complex
## GO:0090575 RNA polymerase II transcription factor complex
CCp.weight01 <- with(ex, CC.weight01)
ggplot(data = data.frame(TStat = rank(score(CC.weight01))[allTerms],
PValue = rank(score(CCp.weight01))[allTerms]),
mapping = aes(x = TStat, y = PValue)) +
geom_point() + geom_smooth() + xlab('Using gene t statistics') +
ylab('Using gene p values') +
ggtitle('Ordering of CC terms by significance')
As expected, ordering genes by the \(t\) statistic produces results different from those when ordering genes by \(p\) value. Some GO terms enriched when using \(p\) values are missed when using \(t\) statistic, typically because those terms are assigned to genes both sub- and overexpressed. See for example the case of the protein phosphorylation process:
showGroupDensity(GOdata.BP, 'GO:0006468', rm.one=FALSE)
Both the genes annotated with that function and their complement display a snake-shaped, bimodal distribution, the former being more extreme. It is not irrelevant, but more difficult to interprete than a term being significant because all their genes behave in a similar way.
save(allgenes2GO,
GOdata.BP, BP.elim, BP.weight01, BP.lea, BP.pvalue.sigTerms,
GOdata.MF, MF.elim, MF.weight01, MF.lea, MF.pvalue.sigTerms,
GOdata.CC, CC.elim, CC.weight01, CC.lea, CC.pvalue.sigTerms,
file = paste('Enrichment', TAG, VAR, 'RData', sep='.'))
sessionInfo()
## R version 3.6.3 (2020-02-29)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.4 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
##
## locale:
## [1] LC_CTYPE=ca_ES.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=es_ES.UTF-8 LC_COLLATE=ca_ES.UTF-8
## [5] LC_MONETARY=es_ES.UTF-8 LC_MESSAGES=ca_ES.UTF-8
## [7] LC_PAPER=es_ES.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] grid stats4 parallel stats graphics grDevices utils
## [8] datasets methods base
##
## other attached packages:
## [1] Rgraphviz_2.30.0 variancePartition_1.16.1 scales_1.1.0
## [4] foreach_1.5.0 ggplot2_3.3.0 limma_3.42.2
## [7] knitr_1.28 topGO_2.38.1 SparseM_1.78
## [10] GO.db_3.10.0 AnnotationDbi_1.48.0 IRanges_2.20.2
## [13] S4Vectors_0.24.4 Biobase_2.46.0 graph_1.64.0
## [16] BiocGenerics_0.32.0
##
## loaded via a namespace (and not attached):
## [1] bit64_0.9-7 splines_3.6.3 gtools_3.8.2
## [4] assertthat_0.2.1 statmod_1.4.34 highr_0.8
## [7] blob_1.2.1 yaml_2.2.1 progress_1.2.2
## [10] pillar_1.4.3 RSQLite_2.2.0 lattice_0.20-41
## [13] glue_1.4.0 digest_0.6.25 minqa_1.2.4
## [16] colorspace_1.4-1 htmltools_0.4.0 Matrix_1.2-18
## [19] plyr_1.8.6 pkgconfig_2.0.3 purrr_0.3.4
## [22] gdata_2.18.0 BiocParallel_1.20.1 lme4_1.1-23
## [25] tibble_3.0.1 mgcv_1.8-31 farver_2.0.3
## [28] ellipsis_0.3.0 withr_2.2.0 pbkrtest_0.4-8.6
## [31] magrittr_1.5 crayon_1.3.4 memoise_1.1.0
## [34] evaluate_0.14 doParallel_1.0.15 nlme_3.1-147
## [37] MASS_7.3-51.6 gplots_3.0.3 tools_3.6.3
## [40] prettyunits_1.1.1 hms_0.5.3 lifecycle_0.2.0
## [43] matrixStats_0.56.0 stringr_1.4.0 munsell_0.5.0
## [46] colorRamps_2.3 compiler_3.6.3 caTools_1.18.0
## [49] rlang_0.4.6 nloptr_1.2.2.1 iterators_1.0.12
## [52] labeling_0.3 bitops_1.0-6 rmarkdown_2.1
## [55] boot_1.3-25 gtable_0.3.0 codetools_0.2-16
## [58] DBI_1.1.0 reshape2_1.4.4 R6_2.4.1
## [61] dplyr_0.8.5 bit_1.1-15.2 KernSmooth_2.23-17
## [64] stringi_1.4.6 Rcpp_1.0.4.6 vctrs_0.2.4
## [67] tidyselect_1.0.0 xfun_0.13
Alexa, Adrian, and Jorg Rahnenfuhrer. 2019. TopGO: Enrichment Analysis for Gene Ontology.
Subramanian, A., P. Tamayo, V.K. Mootha, S. Mukherjee, B.L. Ebert, M.A. Gillette, A. Paulovich, et al. n.d. “Gene Set Enrichment Analysis: A Knowledge-Based Approach for Interpreting Genome-Wide Expression Profiles.” Proceedings of the National Academy of Sciences of the U.s.a. 102 (43): 15545–50. doi:10.1073兾pnas.0506580102.